Empirical Study of Software Metrics

 

Madhuri Gupta1, Dr. Arvind Kalia2

1Department of BCA, NSCBM Govt. College, Hamirpur-177005, Himachal Pradesh,

2Department of Computer science, Himachal Pradesh University, Shimla-171005

*Corresponding Author E-mail: drmundhada@yahoo.com

 

 

ABSTRACT:

Software metrics are applied to obtain comparable values for the degree to which particular software quality features are present. They are key "facts" that test managers use to understand their current position, to prioritize their activities so as to reduce the risk of schedule over-runs on software releases, and to control software projects better. Metrics help to measure current performance. This empirical study was conducted in various IT organizations. Its objective is to investigate the software metrics most frequently used in IT organizations today. Based on the detailed investigation, these are divided into four main groups: organization metrics, project metrics, process metrics and product metrics. Product metrics are the ones most often used in the IT industry today.

 

KEYWORDS: Software Testing, Software Metrics, Empirical Study, Traceability Metrics.

 

1. INTRODUCTION:

In a software development project, errors can be injected at any stage during development. For each phase, there are different techniques to detect and eliminate the errors that originate in that phase. However, no technique is perfect, and some of the errors of the earlier phases inevitably manifest themselves in the code. Testing is the phase where the errors remaining from all the previous phases must be detected. Hence, it plays a critical role in quality assurance and in ensuring the reliability of software [1]. A variety of methods and techniques are available for testing different software quality features. Testing provides an objective view of the software with respect to the context in which it is operated [2].

Testing methods are usually categorized as white-box or black-box testing. White-box testing, sometimes called glass-box testing, guarantees that all independent paths within a module have been exercised at least once, exercises all logical decisions on their true and false sides, executes all loops at their boundaries and within their operational bounds, and exercises internal data structures to ensure their validity [3]. It permits examination of the internal structure of the program, and knowledge of that internal structure can be used to determine the number of test cases required to guarantee a given level of test coverage. Black-box testing, in contrast, focuses on the functional requirements of the software [4]. Test cases are designed to test functional validity, system behaviour and performance. Similar test cases, such as regression tests or certain specific functionality tests, can be grouped together into a test suite. In practice it is not possible to test systems exhaustively, so test cases must be diversified and designed accurately. Different test coverage criteria, such as statement coverage, branch coverage, path coverage, mutation adequacy, and interface-based criteria, have been presented [5].

To prevent overlap and repetition between development life cycle phases, each phase goes through various levels of testing. The first level is unit testing, the process of taking a module, running it in isolation from the rest of the software product using prepared test cases, and comparing the actual results with the results predicted by the specifications and design of the module. The second level is integration testing, in which modules tested in the previous phase are integrated into bigger modules and then tested. At the system testing level, the purpose is to compare the system to its original objectives. The last level, acceptance testing, tests whether the product meets the customer's and user's needs. Acceptance testing can be conducted by the end users of the system in a real environment (beta testing) or at the developer's site (alpha testing) [6]. Software metrics are good measures for guiding the selection of testing techniques and help to measure current performance. This empirical study is aimed at analyzing various software metrics and characterizing them according to use. The study was conducted in various top IT companies, and the observations obtained from it help to identify the most frequently used metrics today.

 

2. SOFTWARE METRICS:

Software metrics are quantifiable measures that can be used to measure different characteristics of software. Paul Goodman [7] defines software metrics as "the continuous application of measurement based techniques to the software development process and its products to supply meaningful and timely management information, together with the use of those techniques to improve that process and its products". G. Gordon Schulmeyer defines a metric as "a quantitative measure of the degree to which a system, component or process possesses a given attribute". Fenton and Pfleeger [8] formally define measurement as a mapping from the empirical world to the formal, relational world; consequently, a measure is the number or symbol assigned to an entity by this mapping in order to characterize an attribute.

Software metrics are an important indicator of the effectiveness of a software testing process. They can be added, deleted or modified on the basis of need, but their reports must be accessible to everyone. The first step in establishing metrics is to identify the key software testing processes that can be objectively measured. This information can be used as the baseline to define the metric(s) and to determine what information will be tracked, who will track it and at what frequency. Then the processes necessary to effectively track, calculate, manage and interpret the defined metrics must be implemented. Areas for process improvement can be identified based on the interpretation of the defined metrics. Software metrics provide a basis for estimation and facilitate planning for closure of the performance gap [9]. They identify risk areas that require more testing, help to resolve potential problems, point out areas of improvement, and hence give an objective measure of the effectiveness and efficiency of the testing process. Metrics should be chosen on the basis of their importance to stakeholders rather than ease of data collection; metrics that are not of interest to stakeholders must be avoided, and complex data must be handled carefully.

The software metrics life cycle involves recognizing the metrics, prioritizing them, classifying metrics that may be project specific, identifying the data required for each metric, setting up a process to capture that data if it is not already available, communicating with the stakeholders, capturing and verifying the data, and analyzing and processing it. While recognizing the correct metrics, one should identify the goals or problem areas where improvement is required and then refine those goals using the Goal-Question-Metric (GQM) approach, an excellent technique for selecting metrics appropriate to one's needs. In this paradigm each project has a set of goals, and each goal has a set of questions that a manager may ask to help understand whether the project is achieving its goals. It is thus a systematic approach for linking goals to a process, identifying metrics relevant to process improvement and tailoring them to the organization. It works as shown in Fig. 2:

 

Figure 2: Goal Question Metric Approach
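As a minimal illustration of the GQM decomposition, a goal, its questions and their candidate metrics can be represented as a simple tree; the goal, questions and metric names below are hypothetical examples, not taken from the study:

```python
# A minimal sketch of a Goal-Question-Metric tree; the goal, questions
# and metric names are hypothetical examples for illustration only.
gqm = {
    "goal": "Improve the reliability of release 2.0",
    "questions": [
        {
            "question": "How many defects escape to the field?",
            "metrics": ["defects per KLOC after release", "defect removal efficiency"],
        },
        {
            "question": "How thoroughly is the code tested?",
            "metrics": ["statement coverage", "branch coverage"],
        },
    ],
}

# Walk the tree: every metric collected should trace back to a question,
# and every question back to the goal.
for q in gqm["questions"]:
    for m in q["metrics"]:
        print(f'{gqm["goal"]} -> {q["question"]} -> {m}')
```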

 

After a detailed investigation of software metrics, they are mainly categorized into organizational, project, process and product metrics. Based on the data capturing methodology, as depicted in Table 1, metrics can also be classified as base metrics or derived metrics. Base metrics are metrics for which data can be captured directly (time, effort, defects, test execution details, etc.). Derived metrics are derived from base metrics (productivity, quality, etc.). For example, productivity can be evaluated as: manual testing – number of test cases / total effort; automation testing – total SLOC / total effort.
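As a minimal sketch, the two productivity formulas can be computed directly from captured base metrics; the variable names and the effort unit (hours) are illustrative assumptions:

```python
def manual_testing_productivity(num_test_cases: int, total_effort_hours: float) -> float:
    """Derived metric: test cases executed per unit of effort."""
    return num_test_cases / total_effort_hours

def automation_testing_productivity(total_sloc: int, total_effort_hours: float) -> float:
    """Derived metric: source lines of automation code per unit of effort."""
    return total_sloc / total_effort_hours

# Base metrics (counts, effort) are captured directly; derived metrics follow:
print(manual_testing_productivity(240, 80.0))       # 3.0 test cases per hour
print(automation_testing_productivity(4000, 80.0))  # 50.0 SLOC per hour
```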

 

2.1 ORGANIZATIONAL METRICS:

Metrics at the level of the organization are useful in overall project planning and management, and assessment of the effectiveness of the software testing process has to rely on appropriate measures. They allow senior management to monitor the overall strength of the organization and point to areas of weakness, thus helping senior management set new goals and plan for the resources needed to realize them. Measures embedded in the organizational-level test strategy make the underlying testing process activities visible, enabling managers and engineers to better understand the connections among the various process activities; examples include testing cost per KLOC (per thousand lines of code), delivery schedule slippage, and time to complete system testing [8]. One might say, "The number of defects reported in the field over all products, and within 3 months of their shipping, has dropped from 0.2 defects per thousand lines of code (KLOC) to 0.04 defects per KLOC."

Lines of code (LOC) is one measure of the length of code the software engineer writes to deliver a software requirement. It is the simplest way to measure the size of a program: count the lines. This is the oldest and most widely used size metric, and it has the advantage of being easily recognized and therefore counted. Although this may seem to be a simple metric that can be counted algorithmically, there is no general agreement about what constitutes a line of code [1]. The quality of comments materially affects maintenance costs, because the maintenance person will depend on the comments more than anything else to do the job; conversely, too many blank lines and comments with poor readability and understandability will increase the maintenance effort. The problem with including comments in the count is that one must be able to distinguish between useful and useless comments, and there is no rigorous way to do that. So it is advisable not to consider comments and blank lines while counting LOC. Line counts are notorious in that they can vary between programming languages and coding styles [11]. A minimal counting sketch is given below.
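Following the advice above to exclude comments and blank lines, a deliberately simple counter for a language with #-style line comments might look like this (a real counter would also handle block comments and comments trailing code):

```python
def count_loc(path: str, comment_prefix: str = "#") -> int:
    """Count non-blank, non-comment lines, per the advice above.

    A deliberately simple sketch: it ignores block comments and
    comments that trail code on the same line.
    """
    loc = 0
    with open(path, encoding="utf-8") as f:
        for line in f:
            stripped = line.strip()
            # Skip blank lines and whole-line comments.
            if stripped and not stripped.startswith(comment_prefix):
                loc += 1
    return loc
```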

 

2.2 PROJECT METRICS:

Project metrics are useful in the monitoring and control of a specific project. The ratio of actual to planned system test effort is one project metric; test effort can be measured in terms of tester-man-months. At the start of the system test phase, for example, the project manager estimates the total system test effort. The ratio of actual to estimated effort is zero prior to the system test phase and builds up over time, and tracking it assists the project manager in allocating testing resources. Another project metric is the ratio of the number of successful tests to the total number of tests in the system test phase. At any time during the project, the evolution of this ratio from the start of the project can be used to estimate the time remaining to complete the system test process [9]. Project metrics are used to describe the project characteristics and execution [10]. Examples are: number of software developers, staffing pattern over the life cycle of the software, cost and schedule, and productivity. Both ratios are straightforward to compute, as sketched below.
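A minimal sketch of both ratios (variable names are illustrative):

```python
def effort_ratio(actual_tester_months: float, planned_tester_months: float) -> float:
    """Ratio of actual to planned system test effort: 0 before system
    test starts, building toward (or past) 1 as testing proceeds."""
    return actual_tester_months / planned_tester_months

def test_success_ratio(successful_tests: int, total_tests: int) -> float:
    """Ratio of successful tests to total tests executed in the system test phase."""
    return successful_tests / total_tests if total_tests else 0.0

print(effort_ratio(6.0, 10.0))       # 0.6: 60% of planned effort consumed
print(test_success_ratio(180, 200))  # 0.9: 90% of executed tests passed
```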

 

 

2.3 PROCESS METRICS:

Every project uses some test process, and several other well-organized processes exist. The goal of process metrics is to assess the goodness of the process. When a test process consists of several phases, for example unit test, integration test and system test, one can measure how many defects were found in each phase. It is well known that the later a defect is found, the costlier it is to fix; hence a metric that classifies defects according to the phase in which they are found assists in evaluating the process itself. Process metrics describe the effectiveness and quality of the process that produces the software product, and help improve software development and maintenance. Examples are: effort required in the process, time to produce the product, effectiveness of defect removal during development, number of defects found during testing, and maturity of the process. One process metric often used in the IT industry is Defect Removal Efficiency (DRE). One variant is computed by dividing the effort required for defect detection, defect resolution and retesting by the number of remarks; this is calculated per test type, during and across test phases. It is a quality metric that provides benefit at both the project and process level. In essence, DRE is a measure of the filtering ability of quality assurance and control activities as they are applied throughout all process framework activities [11]. When considered for a project as a whole, DRE is defined in the following manner:

DRE = E / (E + D)

where E is the number of errors found before delivery of the software to the end user and D is the number of defects found after delivery.

 

 

The ideal value for DRE is 1, meaning that no defects are found in the software after delivery. Realistically, D will be greater than 0, but the value of DRE can still approach 1: as E increases (for a given value of D), the overall value of DRE approaches 1. In fact, as E increases, it is likely that the final value of D will decrease (errors are filtered out before they become defects). Used as an indicator of the filtering ability of quality control and assurance activities, DRE encourages a software project team to institute techniques for finding as many errors as possible before delivery. A minimal computation sketch follows.
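A minimal sketch of the project-level formula:

```python
def defect_removal_efficiency(errors_before_delivery: int,
                              defects_after_delivery: int) -> float:
    """DRE = E / (E + D); approaches 1 as more errors are filtered out
    before the software reaches the end user."""
    e, d = errors_before_delivery, defects_after_delivery
    # Edge case: no errors or defects recorded at all.
    return e / (e + d) if (e + d) else 1.0

print(defect_removal_efficiency(95, 5))  # 0.95
```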

 

2.4 PRODUCT METRICS:

Product metrics relate to a specific product and describe its characteristics, such as size, complexity, design features, performance, and quality level. A product metric quantitatively characterizes some aspect of the structure of a software product, such as a requirements specification, a design, or source code; such metrics are also known as complexity metrics [12]. Product metrics are classified as static or dynamic. For example, the average number of testers working on a project is a static metric, whereas the number of defects remaining to be fixed is a dynamic metric, as it can be computed accurately only after a code change has been made and the product retested [13]. Two generic product metrics are cyclomatic complexity and the Halstead metrics for software testing. Thomas J. McCabe proposed in 1976 that program complexity be measured by the cyclomatic number of the program's flow graph [1]. It directly measures the number of linearly independent paths through a program's source code. Studies have shown that code complexity correlates strongly with program size measured in lines of code, and it is an indication of the extent to which control flow is used. Cyclomatic complexity is a useful metric for predicting which modules are likely to be error prone, and it can be used for test planning as well as test case design. It is based on a graph-theoretical concept, the control flow graph (CFG). For a graph G with n nodes, e edges, and p connected components, the complexity is defined as [4]:

V(G) = e – n + 2p

 

An alternative formulation is to use a graph in which each exit point is connected back to the entry point. In this case, the graph is said to be strongly connected, and the Cyclomatic Complexity of the program is equal to the cyclomatic number of its graph (also known as the first Betti number), which is defined as [10]:

V(G) = e − n + p

 

Figure 3 shows the strongly connected control flow graph of a simple program, for calculation via the alternative method. The program begins executing at the red node, then enters a loop (the group of three nodes immediately below the red node). On exiting the loop, there is a conditional statement (the group below the loop), and finally the program exits at the blue node. For this graph, e = 10, n = 8 and p = 1, so the cyclomatic complexity of the program is 3 [14]. The Halstead metrics for software testing were introduced by Maurice Howard Halstead in 1977. These metrics are computed statically, without program execution. Halstead's theory of software science is one of "the best known and most thoroughly studied ... composite measures of software complexity", using a set of primitive measures that may be derived after code is generated or estimated once design is complete [3]. Table 2 lists a sample of product metrics for the object-oriented approach and other applications.
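To make the first formula concrete, here is a minimal Python sketch (the example CFG is hypothetical) that computes V(G) = e − n + 2p for a single-routine graph:

```python
def cyclomatic_complexity(cfg: dict) -> int:
    """V(G) = e - n + 2p for a CFG given as {node: [successor nodes]}.

    Assumes the CFG forms one connected component (p = 1), the usual
    case for a single routine.
    """
    n = len(cfg)                                   # number of nodes
    e = sum(len(succs) for succs in cfg.values())  # number of edges
    p = 1
    return e - n + 2 * p

# Hypothetical CFG of an if/else inside a loop: 7 nodes, 8 edges -> V(G) = 3.
cfg = {
    "entry": ["loop"],
    "loop": ["cond", "exit"],
    "cond": ["then", "else"],
    "then": ["join"],
    "else": ["join"],
    "join": ["loop"],
    "exit": [],
}
print(cyclomatic_complexity(cfg))  # 3
```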

 

Figure 3: Control Flow Graph of a Simple Program

 

Table 2: A Sample of Product Metrics

| Metric | Meaning |
| --- | --- |
| Reliability | Probability of failure of a software product with respect to a given operational profile in a given environment. |
| Defect density | Number of defects per KLOC. |
| Defect severity | Distribution of defects by their level of severity. |
| Test coverage | Fraction of testable items, e.g. basic blocks, covered; also a metric for test adequacy or goodness of tests. |
| Cyclomatic complexity | Measures the complexity of a program based on its CFG. |
| Weighted methods per class | Σ ci for i = 1..n, where ci is the complexity of method i in the class under consideration. |
| Class coupling | Measures the number of classes to which a given class is coupled. |
| Response set | Set of all methods that can be invoked, directly or indirectly, when a message is sent to an object. |
| Number of children | Number of immediate descendants of a class in the class hierarchy. |

 

2.5 RELIABILITY:

Reliability of a product specifies the probability of failure-free operation of that product for a given time duration [1]. Software reliability is not a direct function of time. Reliability is the main product metric for the testing activity and often depends considerably on the consistency, accuracy and quality of testing; hence, by assessing reliability, the quality of software can be judged. Alternatively, reliability estimation can be used to decide whether enough testing has been done, and reliability models can be used by the project manager to decide when to stop testing [3]. Many models have been proposed for software reliability assessment. To use a model for a given software system, data is needed to estimate the reliability of the software. These models attempt to describe the occurrence of defects as a function of time, allowing one to define the reliability and the mean time to failure (MTTF), which is defined as:

Mean Time to Failure = Total Test Execution Time/Total number of Failures during Test [15].

One example is the model described by Musa which, like most others of this type, makes four basic assumptions: test inputs are random samples from the input environment; all software failures are observed; failure intervals are independent of each other; and times between failures are exponentially distributed [16].
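Under Musa's exponential assumption, MTTF and the implied reliability over a mission time t follow directly from the definition above; a minimal sketch:

```python
import math

def mttf(total_test_execution_time: float, total_failures: int) -> float:
    """Mean Time to Failure = total test execution time / failures observed."""
    return total_test_execution_time / total_failures

def reliability(t: float, mean_time_to_failure: float) -> float:
    """R(t) = exp(-t / MTTF) when times between failures are
    exponentially distributed, as Musa's model assumes."""
    return math.exp(-t / mean_time_to_failure)

m = mttf(500.0, 10)          # one failure per 50 hours of testing
print(reliability(10.0, m))  # ~0.82: chance of surviving a 10-hour run
```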

 

2.6 FUNCTION POINTS:

Alan Albrecht, while working for IBM, recognized the problem of size measurement in the 1970s and developed a technique called Function Point analysis, which appeared to be a solution to the size measurement problem. It measures functionality from the user's point of view, that is, on the basis of what the user requests and receives in return [1]. The function point is one of the most widely used measures of software size, and project managers use it to plan and execute life cycle activities common across projects [17]. FP is calculated after determining the numbers of inputs, outputs, queries, files and interfaces. These counts are then weighted by multiplying each by a weighting factor based on whether the item is simple, average or complex:

FP = count total × [0.65 + 0.01 × ∑ Fi]

where Fi (i = 1, ..., 14) is the complexity adjustment value for each of the 14 general system characteristics.
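A minimal sketch of the FP computation above; the weight table uses the commonly published simple/average/complex factors, and the counts and adjustment values are hypothetical illustrations:

```python
# Standard weighting factors (simple, average, complex) for the five
# count types; the counts further below are hypothetical.
WEIGHTS = {
    "inputs":     (3, 4, 6),
    "outputs":    (4, 5, 7),
    "inquiries":  (3, 4, 6),
    "files":      (7, 10, 15),
    "interfaces": (5, 7, 10),
}

def function_points(counts: dict, complexity_adjustments: list) -> float:
    """FP = count_total * [0.65 + 0.01 * sum(Fi)], with Fi rated 0..5
    for each of the 14 general system characteristics."""
    assert len(complexity_adjustments) == 14
    count_total = sum(
        n * WEIGHTS[kind][level]
        for kind, levels in counts.items()
        for level, n in enumerate(levels)  # level 0/1/2 = simple/average/complex
    )
    return count_total * (0.65 + 0.01 * sum(complexity_adjustments))

# (simple, average, complex) counts for each type, all hypothetical:
counts = {"inputs": (4, 2, 1), "outputs": (3, 1, 0), "inquiries": (2, 0, 0),
          "files": (1, 1, 0), "interfaces": (0, 1, 0)}
print(function_points(counts, [3] * 14))  # adjustment factor 0.65 + 0.42 = 1.07
```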

 

2.7 TEST COVERAGE METRICS:

Test coverage is a measure used in software testing, defined as the extent to which testing covers the product's complete functionality. Because it inspects the code of the program, it is a form of white-box testing. This technique was among the first invented for systematic software testing; the first published reference was by Miller and Maloney [18] in Communications of the ACM in 1963. The metric is an indication of the completeness of the testing and is used as a criterion to stop testing; it does not indicate anything about the effectiveness of the testing. To measure how well the program is exercised by a test suite, the coverage criteria used are function coverage, statement coverage, decision/branch coverage, condition coverage or predicate coverage, modified condition/decision coverage (MC/DC), and path coverage. Combined with different code coverage methods, the aim is to develop a rigorous, yet manageable, set of regression tests.

As one might expect, there are classes of software that cannot feasibly be subjected to these coverage tests, though a degree of coverage mapping can be approximated through analysis rather than direct testing. There are some sorts of defects that are affected by such tools: in particular, some race conditions or similar real-time-sensitive operations can be masked when run under code coverage environments, and conversely, some of these defects may become easier to find as a result of the additional overhead of the testing code. Code coverage may be regarded as a more up-to-date incarnation of debugging, in that the automated tools used to achieve statement and path coverage are often referred to as "debugging utilities". These tools allow the program code under test to be observed on screen while the program is executing; additionally, commands and keyboard function keys are available to allow the code to be "stepped" through literally line by line.
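As a minimal sketch of how statement coverage can be measured in practice, the following uses Python's built-in tracing hook to record which lines of a function execute under a given set of test inputs; the function under test and the inputs are hypothetical:

```python
import dis
import sys

def line_coverage(func, test_inputs):
    """Fraction of a function's executable lines hit by the given inputs."""
    code = func.__code__
    # All executable line numbers of the function, from its bytecode.
    executable = {lineno for _, lineno in dis.findlinestarts(code)
                  if lineno is not None}
    executed = set()

    def tracer(frame, event, arg):
        # Record line events only for the function under test.
        if event == "line" and frame.f_code is code:
            executed.add(frame.f_lineno)
        return tracer

    sys.settrace(tracer)
    try:
        for args in test_inputs:
            func(*args)
    finally:
        sys.settrace(None)
    return len(executed & executable) / len(executable)

def classify(x):  # hypothetical function under test
    if x < 0:
        return "negative"
    return "non-negative"

print(line_coverage(classify, [(5,)]))         # misses the negative branch
print(line_coverage(classify, [(5,), (-1,)]))  # full statement coverage
```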

 

2.8 TRACEABILITY METRICS:

This is the measure describing how well traceability is performed from the original allocated requirements to the software, through design, coding and testing. Traceability metrics ought to consider both directions, i.e. from individual requirements to test results and from test results back to individual requirements, mapping defects to test cases, test cases to functional specifications, and functional specifications to business requirements in order to trace back the defects [19]. Traceability is the ability to determine that each feature has a source in the requirements and that each requirement has a corresponding implemented feature; this is useful in assessing test coverage. The traceability matrix is a very good document that gives a complete overview of, and confidence in, the end-to-end activities of the SDLC, but it needs to be prepared carefully, since a lot of complex data must be captured at all stages of the SDLC. Using the traceability matrix, all the user requirements are properly documented and respectively addressed in the functional requirement document, the design documents (HLDD and LLDD), coding and unit testing, the system test plans, and the test execution reports. So, precisely, the traceability matrix gives end-to-end visibility of the requirements and serves as documented proof that all the specifications have been tested [20]. The progress of the testing effort can be viewed with the help of a traceability chart, as shown in Table 3 [6]:

 

Table 3: Testing Effort

|        | Requirement 1 | Requirement 2 | Requirement 3 | Etc. |
| ------ | ------------- | ------------- | ------------- | ---- |
| Test 1 | *             |               |               |      |
| Test 2 | *             | *             |               |      |
| Test 3 | *             |               |               |      |

 

 

 

 
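As a minimal sketch, the matrix of Table 3 can be held as a simple mapping and checked in both directions, as the paragraph above recommends; all names are hypothetical:

```python
# Hypothetical test-to-requirement mapping mirroring Table 3.
matrix = {
    "Test 1": {"Requirement 1"},
    "Test 2": {"Requirement 1", "Requirement 2"},
    "Test 3": {"Requirement 1"},
}
requirements = {"Requirement 1", "Requirement 2", "Requirement 3"}

covered = set().union(*matrix.values())
untested = requirements - covered  # forward trace: requirement -> test
orphans = covered - requirements   # backward trace: test -> requirement

print("Requirements without tests:", untested)            # {'Requirement 3'}
print("Tests tracing to unknown requirements:", orphans)  # set()
```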

3. ANALYSIS OF VARIOUS SOFTWARE TESTING METRICS:

A survey was conducted among test engineers, test analysts, test leads and test managers in almost 100 software companies, including Infosys, L&T, HCL, HP, ANVISH, Lenovo, Accenture and IBM. Each of these organizations has more than 500 employees. The test engineers interviewed are trained in functional testing. These organizations provide different software services to their clients: software is first developed based upon the requirements of the client, and a separate testing team then tests the developed software properly. Survey interviews were conducted face to face, over the telephone, or via facsimile or email attachment. In all cases, printed or verbal explanatory notes were provided to respondents to ensure consistent interpretation of the terminology and questions in the questionnaire. It is evident from Figure 4 that 40% of test managers think that, among process metrics, the effort required in the process has the major effect on software testing, while 30% think that the number of defects found during testing has the major effect. 20% of test engineers prefer time to produce the product as the major factor, and the remaining 10% prefer effectiveness of defect removal during development.

 

Figure 4: Frequently used Process Metrics

40% of test leads prefer time to complete system testing as the main organization metric in software testing, and 30% favour testing cost per KLOC. 15% believe in the number of defects after product release, and the remaining 15% go with delivery schedule slippage, as shown in Figure 5.

Figure 5: Frequently used Organization Metrics

 

Among product metrics, 45% prefer reliability as the major factor affecting software testing. Defect density is preferred by 35% of testers, 10% favour cyclomatic complexity, and the remaining 10% prefer test coverage, as shown in Figure 6.

 

Figure 6: Frequently used Product Metrics

 

As depicted in Figure 7, 50% of senior test engineers prefer the number of software developers as the major project metric affecting software testing, 20% prefer staffing pattern over the life cycle of the software, 15% prefer cost and schedule as the factor with the main effect on software testing, and the remaining 5% go with productivity.

 

Figure 7: Frequently used Project Metrics

 

From the analysis, it is found that 36% of test managers prefer product metrics for software testing, 25% process metrics, 20% project metrics and 19% organization metrics, as depicted in Figure 8.

 

Figure 8: Frequently Used Category of Testing Metrics

 

It is evident from Figure 9 that 20% of test analysts support traceability metrics (T.M.), 18% favour reliability, 16% cyclomatic complexity (C.COMP), 14% defect removal efficiency (D.R.E.), 12% function points (F.P.), 10% test coverage (T.C.), 8% lines of code (L.O.C.), and 3% Halstead metrics (H.M.).

 

Figure 9: Frequently Used Software Testing Metrics

In terms of efficiency, 30% of test engineers give preference to reliability in software testing, 22% to traceability metrics (T.M.), 20% to defect removal efficiency, and 19% to function points (F.P.), as shown in Figure 10.

 

Figure 10: Software Testing Metrics: In Terms of Efficiency

4. CONCLUSION:

The analysis of data reveals that 70% of test managers prefer traceability metrics. The survey shows that these are the most frequently used software testing metrics in most software companies: they are easy to understand and to apply to the software, and are therefore favoured among software professionals. They are the most prevalent and important testing metrics used in organizations. The least used testing metric found in the empirical investigation is the Halstead metrics, which are complex and time-consuming to calculate; organizations do not prefer them, as very few testers have knowledge of them.

 

Effective management of any process requires quantification, measurement, and modeling. Measurement provides the most appropriate information to ensure consistency and completeness in the quest for goal attainment. Software metrics provide a quantitative basis for the development and validation of models of the software development process, and testing metrics are used to improve software productivity and quality. This study introduced the most commonly used software metrics and reviewed their use in constructing models of the software development process. Although current metrics and models are certainly inadequate, a number of organizations are achieving promising results through their use. Software testing metrics have rarely been used in any regular, methodical fashion, yet recent results indicate that the conscientious implementation and application of a software metrics program can help achieve better management results, both in the short run (for a given project) and in the long run (improving productivity on future projects). Most software metrics cannot meaningfully be discussed in isolation from such metrics programs. Better use of existing metrics and development of improved metrics appear to be important factors in the resolution of the software crisis. The empirical investigation shows that the theoretical study correlates with the empirical study. It is very difficult to compare software testing metrics among themselves, because each testing metric has its own advantages and disadvantages; each is unique in what it contributes to the quality of software.

 

REFERENCES:

1. Jalote Pankaj, "An Integrated Approach to Software Engineering", Narosa, New Delhi, 2004.

2. Ripasi Tabor, "Software Testing - State of the Art and Current Research Challenges", 5th International Symposium on Applied Computational Intelligence and Informatics, pages 47-50, May 28-29, 2009.

3. Pressman R. S., "Software Engineering: A Practitioner's Approach", McGraw-Hill, New York, 2001.

4. Aggarwal K. K. and Singh Yogesh, "Software Engineering", New Age International, New Delhi, 2005.

5. Gray box testing. http://www.testinggeek.com/index.php/testing-types/system-Knowledge/51-grey-box-testing.

6. Kaner Cem and Bond Walter P., "Software Engineering Metrics: What Do They Measure and How Do We Know?", 10th International Software Metrics Symposium, 2004.

7. Pusala Ramesh, "Operational Excellence through Efficient Software Testing Metrics", Infosys Tech. Ltd., August 2006. www.infosys.com/IT-services/independent-validation-services/white-papers/operational-excellence.pdf.

8. Afzal Wasif and Torkar Richard, "Incorporating Metrics in an Organizational Test Strategy", IEEE International Conference on Software Testing Verification and Validation Workshop, 2008.

9. Suvarna Vinod Kumar, "Challenges of Managing a Testing Project: A White Paper". www.edistalearning.com/.../Challenges_Testing_Project.pdf.

10. Project Metrics. www.aivosto.com/project/help/pm-loc.html.

11. Jasmine K. S. and Vasandha R., "DRE - A Quality Metric for Component Based Software Products", World Academy of Science, Engineering and Technology 34, 2007, pages 48-51.

12. El Emam K., "A Methodology for Validating Software Product Metrics", National Research Council of Canada, Ottawa, Ontario, Canada, June 2000.

13. Mathur Aditya P., Foundations of Software Testing, Dorling Kindersley Pvt. Ltd., 2008, pages 27-32.

14. Cyclomatic Complexity. http://en.wikipedia.org/wiki/Cyclomatic_complexity.

15. Schneidewind Norman F., "Measuring and Evaluating Maintenance Process Using Reliability, Risk and Test Metrics", IEEE Xplore, 1989, pages 232-239.

16. Mills Everald E., "Software Metrics", U.S. Department of Defense, December 1988, pages 1-39.

17. Jawadekar Waman S., "Software Engineering: Principles and Practices", Tata McGraw-Hill, 2004.

18. Code Coverage. http://en.wikipedia.org/wiki/Code_coverage.

19. Traceability Metrics. www.faqs.org/qa/qa-6554.html.

20. Traceability Metrics. http://www.geekinterview.com/question_details/16607.

 

 

 

Received on 11.11.2016       Modified on 27.11.2016

Accepted on 29.11.2016      ©A&V Publications All right reserved

DOI: 10.5958/2349-2988.2017.00003.1

 

Research J. Science and Tech. 2017; 9(1):17-24.